R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.2 compiler_4.3.2 fastmap_1.1.1 cli_3.6.1
[5] tools_4.3.2 htmltools_0.5.7 rstudioapi_0.15.0 yaml_2.3.7
[9] rmarkdown_2.25 knitr_1.45 jsonlite_1.8.7 xfun_0.41
[13] digest_0.6.33 rlang_1.1.2 evaluate_0.23
Q1. Git/GitHub
No handwritten homework reports are accepted for this course. We work with Git and GitHub. Efficient and abundant use of Git, e.g., frequent and well-documented commits, is an important criterion for grading your homework.
Apply for the Student Developer Pack at GitHub using your UCLA email. You’ll get a GitHub Pro account for free (unlimited public and private repositories).
Create a private repository biostat-203b-2024-winter and add Hua-Zhou and TA team (Tomoki-Okuno for Lec 1; jonathanhori and jasenzhang1 for Lec 80) as your collaborators with write permission.
Top directories of the repository should be hw1, hw2, … Maintain two branches, main and develop. The develop branch will be your main playground, the place where you develop solutions (code) to homework problems and write up reports. The main branch will be your presentation area. Submit your homework files (Quarto file qmd, html file converted by Quarto, all code and extra data sets to reproduce results) in the main branch.
After each homework due date, the course reader and instructor will check out your main branch for grading. Tag each of your homework submissions with tag names hw1, hw2, … Tagging time will be used as your submission time. That means if you tag your hw1 submission after the deadline, penalty points will be deducted for late submission.
After this course, you can make this repository public and use it to demonstrate your skill sets on the job market.
Answer: I have finished all the steps above.
Q2. Data ethics training
This exercise (and later exercises in this course) uses the MIMIC-IV data v2.2, a freely accessible critical care database developed by the MIT Lab for Computational Physiology. Follow the instructions at https://mimic.mit.edu/docs/gettingstarted/ to (1) complete the CITI Data or Specimens Only Research course and (2) obtain the PhysioNet credential for using the MIMIC-IV data. Display the verification links to your completion report and completion certificate here. You must complete Q2 before working on the remaining questions. (Hint: The CITI training takes a few hours and the PhysioNet credentialing takes a couple days; do not leave it to the last minute.)
Make the MIMIC v2.2 data available at location ~/mimic.
Answer: I have downloaded the MIMIC v2.2 data and put it in the folder ~/mimic. As required, the data files are not committed to Git, not copied into my directory, and the gz files are not decompressed. The following bash command displays the contents of the folder ~/mimic.
ls -l ~/mimic/
total 48
-rw-rw-r--@ 1 zihengzhang staff 13332 Jan 5 2023 CHANGELOG.txt
-rw-rw-r--@ 1 zihengzhang staff 2518 Jan 5 2023 LICENSE.txt
-rw-rw-r--@ 1 zihengzhang staff 2884 Jan 6 2023 SHA256SUMS.txt
drwxr-xr-x@ 24 zihengzhang staff 768 Jan 13 12:28 hosp
drwxr-xr-x@ 11 zihengzhang staff 352 Jan 13 12:28 icu
Refer to the documentation https://physionet.org/content/mimiciv/2.2/ for details of data files. Please do not put these data files into Git; they are big. Do not copy them into your directory. Do not decompress the gz data files. These create unnecessary big files and are not big-data-friendly practices. Read from the data folder ~/mimic directly in the following exercises.
Use Bash commands to answer the following questions.
Display the contents in the folders hosp and icu using Bash command ls -l. Why are these data files distributed as .csv.gz files instead of .csv (comma separated values) files? Read the page https://mimic.mit.edu/docs/iv/ to understand what’s in each folder.
Answer: The data files are distributed as .csv.gz files instead of .csv files because gzip compression makes them much smaller. Smaller files take up less storage space, transfer more quickly over networks, and are easier to store and back up. The .csv.gz files were compressed with gzip, and tools such as zcat can read them as a stream without writing a decompressed copy to disk.
The following bash command displays the contents in the folders hosp and icu. The hosp folder contains all data acquired from the hospital-wide electronic health record. Information covered includes patient and admission information, laboratory measurements, microbiology, medication administration, and billed diagnoses.
ls -l ~/mimic/hosp/
total 8859752
-rw-rw-r--@ 1 zihengzhang staff 15516088 Jan 5 2023 admissions.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 427468 Jan 5 2023 d_hcpcs.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 859438 Jan 5 2023 d_icd_diagnoses.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 578517 Jan 5 2023 d_icd_procedures.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 12900 Jan 5 2023 d_labitems.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 25070720 Jan 5 2023 diagnoses_icd.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 7426955 Jan 5 2023 drgcodes.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 508524623 Jan 5 2023 emar.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 471096030 Jan 5 2023 emar_detail.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 1767138 Jan 5 2023 hcpcsevents.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 1939088924 Jan 5 2023 labevents.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 96698496 Jan 5 2023 microbiologyevents.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 36124944 Jan 5 2023 omr.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 2312631 Jan 5 2023 patients.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 398753125 Jan 5 2023 pharmacy.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 498505135 Jan 5 2023 poe.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 25477219 Jan 5 2023 poe_detail.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 458817415 Jan 5 2023 prescriptions.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 6027067 Jan 5 2023 procedures_icd.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 122507 Jan 5 2023 provider.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 6781247 Jan 5 2023 services.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 36158338 Jan 5 2023 transfers.csv.gz
The icu folder contains information collected from the clinical information system used within the ICU. Documented data includes intravenous administrations, ventilator settings, and other charted items.
ls -l ~/mimic/icu/
total 6155968
-rw-rw-r--@ 1 zihengzhang staff 35893 Jan 5 2023 caregiver.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 2467761053 Jan 5 2023 chartevents.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 57476 Jan 5 2023 d_items.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 45721062 Jan 5 2023 datetimeevents.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 2614571 Jan 5 2023 icustays.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 251962313 Jan 5 2023 ingredientevents.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 324218488 Jan 5 2023 inputevents.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 38747895 Jan 5 2023 outputevents.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 20717852 Jan 5 2023 procedureevents.csv.gz
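To make the storage argument concrete, here is a minimal sketch on a synthetic file rather than the MIMIC data. It writes a repetitive CSV (the name toy.csv and its columns are made up for illustration), compresses a copy with gzip, and compares byte counts. The ratio for the real MIMIC files will differ.

```bash
#!/usr/bin/env bash
# Sketch: compare a synthetic CSV with its gzip-compressed copy.
# toy.csv and its columns are made up; they only mimic MIMIC-style repetition.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# A header plus 10,000 rows in which most of each line repeats.
{
  echo "subject_id,admission_type,insurance"
  for i in $(seq 1 10000); do
    echo "$((10000000 + i)),URGENT,Medicare"
  done
} > toy.csv

# gzip -c writes the compressed stream to stdout, so toy.csv is kept.
gzip -c toy.csv > toy.csv.gz

# $(( )) strips the padding some wc implementations add.
orig_bytes=$(( $(wc -c < toy.csv) ))
gz_bytes=$(( $(wc -c < toy.csv.gz) ))
echo "toy.csv:    $orig_bytes bytes"
echo "toy.csv.gz: $gz_bytes bytes"
```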
Briefly describe what Bash commands zcat, zless, zmore, and zgrep do.
Answer:
The zcat command decompresses a compressed file and writes its contents to standard output, leaving the compressed file on disk unchanged.
The zless command opens a compressed file in the pager less, so we can scroll forward and backward (line by line or page by page) and search within it.
The zmore command opens a compressed file in the pager more, which displays it one screen at a time and mainly pages forward.
The zgrep command runs grep on a compressed file, searching for a specified pattern without decompressing the file to disk.
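A small sketch of zcat and zgrep, assuming a throwaway gzip file built on the fly (demo.csv.gz is a hypothetical name, not a MIMIC file). zless and zmore are interactive pagers, so they appear only as comments.

```bash
#!/usr/bin/env bash
# Sketch: zcat and zgrep on a tiny made-up gzip file.
set -eu

workdir=$(mktemp -d)
printf 'id,race\n1,WHITE\n2,ASIAN\n3,WHITE\n' | gzip -c > "$workdir/demo.csv.gz"

# zcat streams the decompressed text to stdout; the .gz file is unchanged.
# Reading from stdin with < (as in the assignment's hint) sidesteps the
# macOS zcat, which otherwise looks for a .Z suffix.
zcat < "$workdir/demo.csv.gz"

# zgrep runs grep on the decompressed stream; -c counts matching lines.
white_rows=$(zgrep -c 'WHITE' "$workdir/demo.csv.gz")
echo "rows mentioning WHITE: $white_rows"

# Interactive pagers, shown only as comments:
#   zless "$workdir/demo.csv.gz"   # scroll both ways, search with /
#   zmore "$workdir/demo.csv.gz"   # page forward with the space bar
```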
(Looping in Bash) What’s the output of the following bash script?
for datafile in ~/mimic/hosp/{a,l,pa}*.gz
do
  ls -l $datafile
done
-rw-rw-r--@ 1 zihengzhang staff 15516088 Jan 5 2023 /Users/zihengzhang/mimic/hosp/admissions.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 1939088924 Jan 5 2023 /Users/zihengzhang/mimic/hosp/labevents.csv.gz
-rw-rw-r--@ 1 zihengzhang staff 2312631 Jan 5 2023 /Users/zihengzhang/mimic/hosp/patients.csv.gz
Answer: The script prints the long listing (ls -l) of each file in the ~/mimic/hosp/ directory that matches the pattern. Brace expansion first turns {a,l,pa}*.gz into a*.gz l*.gz pa*.gz, and each of these globs then matches files starting with “a”, “l”, or “pa” and ending with .gz, here admissions.csv.gz, labevents.csv.gz, and patients.csv.gz.
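A sketch of how the pattern expands, using empty stand-in files in a temporary directory that borrow some of the hosp file names (the files are placeholders, not the MIMIC data). Brace expansion runs first and is purely textual; pathname globbing then runs on each resulting word.

```bash
#!/usr/bin/env bash
# Sketch: brace expansion first, then globbing, on empty placeholder files.
set -eu

demo=$(mktemp -d)
touch "$demo"/{admissions,labevents,patients,pharmacy,poe,omr}.csv.gz

# Quoting the * blocks globbing, so only the brace expansion is visible.
echo {a,l,pa}'*.gz'        # a*.gz l*.gz pa*.gz

# The loop from the question, with ls -l replaced by basename for brevity.
for datafile in "$demo"/{a,l,pa}*.gz; do
  basename "$datafile"
done
```

pharmacy.csv.gz and poe.csv.gz are skipped because they start with "ph" and "po", not "pa".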
Display the number of lines in each data file using a similar loop. (Hint: combine linux commands zcat < and wc -l.)
Answer: The following bash script displays the number of lines in each data file in the ~/mimic/hosp/ directory. admissions.csv.gz has 431232 lines. labevents.csv.gz has 118171368 lines. patients.csv.gz has 299713 lines. All three include the header line.
for datafile in ~/mimic/hosp/{a,l,pa}*.gz
do
  echo $datafile
  zcat < $datafile | wc -l
done
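Because every count above includes the header line, the number of data rows is one less. Here is a minimal sketch on two toy gzip files with made-up contents. It reports both numbers and uses tail -n +2 to drop the header.

```bash
#!/usr/bin/env bash
# Sketch: total lines vs. data rows (header excluded) in gzip CSV files.
# The two toy files below are made up for illustration.
set -eu

demo=$(mktemp -d)
printf 'subject_id,hadm_id\n1,10\n1,11\n2,12\n' | gzip -c > "$demo/admissions.csv.gz"
printf 'subject_id\n1\n2\n3\n4\n' | gzip -c > "$demo/patients.csv.gz"

for datafile in "$demo"/*.gz; do
  total=$(( $(zcat < "$datafile" | wc -l) ))
  # tail -n +2 starts output at line 2, i.e. it drops the header line.
  rows=$(( $(zcat < "$datafile" | tail -n +2 | wc -l) ))
  echo "$(basename "$datafile"): $total lines, $rows data rows"
done
```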
Display the first few lines of admissions.csv.gz. How many rows are in this data file? How many unique patients (identified by subject_id) are in this data file? Do they match the number of patients listed in the patients.csv.gz file? (Hint: combine Linux commands zcat <, head/tail, awk, sort, uniq, wc, and so on.)
Answer: Piping zcat < into head displays the first few lines of admissions.csv.gz. There are 431232 lines in this data file including the header line, i.e., 431231 data rows. There are 180733 unique patients (distinct subject_id values) in this data file. This does not match the 299712 patients listed in patients.csv.gz, which means that not every patient in patients.csv.gz has an admission recorded in admissions.csv.gz.
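The steps described in this answer can be sketched on a toy file. This assumes that subject_id is the first comma-separated column, as in the admissions.csv.gz header; toy_adm.csv.gz and its rows are made up.

```bash
#!/usr/bin/env bash
# Sketch: first lines, row count, and unique subject_id count on toy data.
# Assumes subject_id is column 1; toy_adm.csv.gz is a hypothetical file.
set -eu

demo=$(mktemp -d)
printf 'subject_id,hadm_id,admission_type\n101,1,URGENT\n101,2,ELECTIVE\n102,3,URGENT\n103,4,EW EMER.\n' \
  | gzip -c > "$demo/toy_adm.csv.gz"

# First few lines (header included).
zcat < "$demo/toy_adm.csv.gz" | head -5

# Number of data rows, skipping the header line.
n_rows=$(( $(zcat < "$demo/toy_adm.csv.gz" | tail -n +2 | wc -l) ))

# Unique patients: keep column 1, skip the header (NR > 1), de-duplicate, count.
n_patients=$(( $(zcat < "$demo/toy_adm.csv.gz" | awk -F, 'NR > 1 {print $1}' | sort -u | wc -l) ))

echo "rows: $n_rows, unique subject_id: $n_patients"
```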
What are the possible values taken by each of the variable admission_type, admission_location, insurance, and race? Also report the count for each unique value of these variables. (Hint: combine Linux commands zcat, head/tail, awk, uniq -c, wc, and so on; skip the header line.)
Answer: Counting the distinct values of admission_type (skipping the header line) gives 9 unique values. The first line below is that count; the remaining lines give the count for each unique value.
9
6626 AMBULATORY OBSERVATION
19554 DIRECT EMER.
18707 DIRECT OBSERVATION
10565 ELECTIVE
94776 EU OBSERVATION
149413 EW EMER.
52668 OBSERVATION ADMIT
34231 SURGICAL SAME DAY ADMISSION
44691 URGENT
Answer: Counting the distinct values of admission_location (skipping the header line) gives 11 unique values. The first line below is that count; the remaining lines give the count for each unique value.
11
185 AMBULATORY SURGERY TRANSFER
10008 CLINIC REFERRAL
232595 EMERGENCY ROOM
359 INFORMATION NOT AVAILABLE
4205 INTERNAL TRANSFER TO OR FROM PSYCH
5479 PACU
114963 PHYSICIAN REFERRAL
7804 PROCEDURE SITE
35974 TRANSFER FROM HOSPITAL
3843 TRANSFER FROM SKILLED NURSING FACILITY
15816 WALK-IN/SELF REFERRAL
Answer: Counting the distinct values of insurance (skipping the header line) gives 3 unique values.
Answer: Counting the distinct values of race (skipping the header line) gives 33 unique values. The first line below is that count; the remaining lines give the count for each unique value.
33
919 AMERICAN INDIAN/ALASKA NATIVE
6156 ASIAN
1198 ASIAN - ASIAN INDIAN
5587 ASIAN - CHINESE
506 ASIAN - KOREAN
1446 ASIAN - SOUTH EAST ASIAN
2530 BLACK/AFRICAN
59959 BLACK/AFRICAN AMERICAN
4765 BLACK/CAPE VERDEAN
2704 BLACK/CARIBBEAN ISLAND
7754 HISPANIC OR LATINO
437 HISPANIC/LATINO - CENTRAL AMERICAN
639 HISPANIC/LATINO - COLUMBIAN
500 HISPANIC/LATINO - CUBAN
4383 HISPANIC/LATINO - DOMINICAN
1330 HISPANIC/LATINO - GUATEMALAN
536 HISPANIC/LATINO - HONDURAN
665 HISPANIC/LATINO - MEXICAN
8076 HISPANIC/LATINO - PUERTO RICAN
892 HISPANIC/LATINO - SALVADORAN
560 MULTIPLE RACE/ETHNICITY
386 NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER
15102 OTHER
1761 PATIENT DECLINED TO ANSWER
1510 PORTUGUESE
505 SOUTH AMERICAN
1603 UNABLE TO OBTAIN
10668 UNKNOWN
272932 WHITE
1103 WHITE - BRAZILIAN
1170 WHITE - EASTERN EUROPEAN
7925 WHITE - OTHER EUROPEAN
5024 WHITE - RUSSIAN
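The counts above can be reproduced with a pipeline of zcat, awk, sort, and uniq -c. The sketch below looks up each column's position from the header by name instead of hard-coding it, and runs on a small made-up file. count_values is a hypothetical helper name. The sketch assumes no field contains an embedded comma, since awk -F, does not understand CSV quoting.

```bash
#!/usr/bin/env bash
# Sketch: tabulate distinct values of named columns in a gzip CSV.
# Assumes no field contains an embedded comma (awk -F, ignores CSV quoting).
set -eu

# count_values FILE.gz COLUMN_NAME -> "count value" lines (hypothetical helper)
count_values() {
  zcat < "$1" | awk -F, -v col="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) k = i
      if (!k) { print "no column named " col > "/dev/stderr"; exit 1 }
      next
    }
    { print $k }' | sort | uniq -c
}

demo=$(mktemp -d)
printf '%s\n' \
  'subject_id,admission_type,admission_location,insurance,race' \
  '1,URGENT,EMERGENCY ROOM,Medicare,WHITE' \
  '2,EW EMER.,EMERGENCY ROOM,Other,ASIAN' \
  '3,URGENT,PHYSICIAN REFERRAL,Medicare,WHITE' \
  | gzip -c > "$demo/toy_adm.csv.gz"

for var in admission_type admission_location insurance race; do
  echo "== $var: $(( $(count_values "$demo/toy_adm.csv.gz" "$var" | wc -l) )) unique values"
  count_values "$demo/toy_adm.csv.gz" "$var"
done
```

On the real data the call would be count_values ~/mimic/hosp/admissions.csv.gz race, and piping its output to wc -l gives the number of unique values.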
To compress, or not to compress. That’s the question. Let’s focus on the big data file labevents.csv.gz. Compare the compressed gz file size to the uncompressed file size. Compare the run times of zcat < ~/mimic/hosp/labevents.csv.gz | wc -l versus wc -l labevents.csv. Discuss the trade-off between storage and speed for big data files. (Hint: gzip -dk < FILENAME.gz > ./FILENAME. Remember to delete the large labevents.csv file after the exercise.)
Answer: The compressed labevents.csv.gz file is 1.8G, while the uncompressed labevents.csv (obtained with gzip -dk as in the hint) is 13G, roughly seven times larger.
Answer: The two timing commands below compare zcat < ~/mimic/hosp/labevents.csv.gz | wc -l with wc -l labevents.csv. On my computer, the first took about 13.914s and the second about 15.861s.
time zcat < ~/mimic/hosp/labevents.csv.gz | wc -l
time wc -l labevents.csv
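Answer: In theory, the trade-off between storage and speed is that compressed files take up much less storage space but must be decompressed every time they are read, which costs CPU time, whereas uncompressed files take up more space but can be read directly. On my computer, however, zcat < ~/mimic/hosp/labevents.csv.gz | wc -l was faster than wc -l labevents.csv. A plausible explanation is that reading the 13G uncompressed file from disk takes longer than reading the 1.8G compressed file and decompressing it on the fly, so for large files the reduced I/O can outweigh the decompression cost.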
Answer: Finally, delete the large labevents.csv file.
rm labevents.csv
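The whole storage-versus-speed workflow can be rehearsed on a synthetic file before touching labevents.csv.gz. This is a sketch in which a generated file (big.csv, a hypothetical name with made-up rows) stands in for the real one. On a file this small the timings are tiny and say little about the 13G case.

```bash
#!/usr/bin/env bash
# Sketch: decompress, compare sizes and line counts, time both, clean up.
# big.csv is a hypothetical stand-in for labevents.csv.
set -eu

demo=$(mktemp -d)
seq 1 200000 | awk '{print $1 ",toy_value,1.0"}' > "$demo/big.csv"
gzip -c "$demo/big.csv" > "$demo/big.csv.gz"
rm "$demo/big.csv"   # start from the compressed copy only, as with MIMIC

# gzip -dc reads the .gz on stdin and writes plain text to stdout,
# so the .gz stays in place (same idea as the hint's gzip -dk).
gzip -dc < "$demo/big.csv.gz" > "$demo/big.csv"

gz_bytes=$(( $(wc -c < "$demo/big.csv.gz") ))
csv_bytes=$(( $(wc -c < "$demo/big.csv") ))
echo "compressed: $gz_bytes bytes, uncompressed: $csv_bytes bytes"

# Both routes count the same lines; only the cost differs.
time zcat < "$demo/big.csv.gz" | wc -l
time wc -l < "$demo/big.csv"

lines_gz=$(( $(zcat < "$demo/big.csv.gz" | wc -l) ))
lines_csv=$(( $(wc -l < "$demo/big.csv") ))

# Delete the large uncompressed copy afterwards, as the exercise asks.
rm "$demo/big.csv"
```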
Q4. Who’s popular in Pride and Prejudice
You and your friend have just finished reading Pride and Prejudice by Jane Austen. Among the four main characters in the book, Elizabeth, Jane, Lydia, and Darcy, your friend thinks that Darcy was the most mentioned. You, however, are certain it was Elizabeth. Obtain the full text of the novel from http://www.gutenberg.org/cache/epub/42671/pg42671.txt and save it to your local folder.
Explain what wget -nc does. Do not put this text file pg42671.txt in Git. Complete the following loop to tabulate the number of times each of the four characters is mentioned using Linux commands.
Answer: wget -nc (no-clobber) downloads the file from the URL only if it does not already exist in the current directory. If the file already exists, wget -nc skips the download and reports that the file is “already there; not retrieving”.
wget -nc http://www.gutenberg.org/cache/epub/42671/pg42671.txt
for char in Elizabeth Jane Lydia Darcy
do
  echo $char:
  grep -o -i $char pg42671.txt | wc -l
done
Answer: Counting case-insensitive matches (grep -o -i), Elizabeth is mentioned 634 times, Jane 293 times, Lydia 171 times, and Darcy 418 times. Therefore, Elizabeth is the most mentioned character in the book.
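A note on the counting pipeline: grep -c counts matching lines, whereas grep -o prints each match on its own line, so grep -o ... | wc -l counts every mention even when a name appears twice on one line. Here is a minimal sketch on two made-up sentences.

```bash
#!/usr/bin/env bash
# Sketch: lines vs. matches vs. case-insensitive matches with grep.
set -eu

text=$(mktemp)
printf '%s\n' \
  'Elizabeth and Jane walked; Elizabeth smiled.' \
  'ELIZABETH was not mentioned by Darcy.' > "$text"

lines=$(( $(grep -c 'Elizabeth' "$text") ))                  # 1 matching line
matches=$(( $(grep -o 'Elizabeth' "$text" | wc -l) ))        # 2 matches on that line
matches_i=$(( $(grep -o -i 'Elizabeth' "$text" | wc -l) ))   # 3: -i also matches ELIZABETH
echo "lines=$lines matches=$matches case-insensitive=$matches_i"
```

The -i option also counts all-caps or lower-case spellings, and a plain pattern matches inside longer words too. The counts above are therefore occurrences of the string, not necessarily of the character's name as a standalone word.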
What’s the difference between the following two commands?
echo 'hello, world' > test1.txt
and
echo 'hello, world' >> test2.txt
Answer: The first command, echo 'hello, world' > test1.txt, writes the string “hello, world” to test1.txt, creating the file if needed; if test1.txt already exists, its previous contents are overwritten. The second command, echo 'hello, world' >> test2.txt, appends the string “hello, world” to the end of test2.txt, creating the file if it does not exist and otherwise keeping its existing content.
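One quick way to see the difference is to run each redirection twice in a scratch directory and count the lines. This is a sketch, and the temporary directory comes from mktemp.

```bash
#!/usr/bin/env bash
# Sketch: run each redirection twice, then count lines in each file.
set -eu
cd "$(mktemp -d)"

for _ in 1 2; do
  echo 'hello, world' > test1.txt    # truncates the file, then writes
  echo 'hello, world' >> test2.txt   # creates if missing, otherwise appends
done

wc -l test1.txt test2.txt            # test1.txt: 1 line, test2.txt: 2 lines
```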
Using your favorite text editor (e.g., vi), type the following and save the file as middle.sh:
#!/bin/sh
# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"
Use chmod to make the file executable by the owner, and run
chmod u+x ./middle.sh
./middle.sh pg42671.txt 20 5
Editor: R. W. Chapman
Release date: May 9, 2013 [eBook #42671]
Language: English
Explain the output. Explain the meaning of "$1", "$2", and "$3" in this shell script. Why do we need the first line of the shell script?
Answer: head -n 20 pg42671.txt keeps the first 20 lines of the file, and tail -n 5 keeps the last 5 of those, so ./middle.sh pg42671.txt 20 5 prints lines 16 to 20 of pg42671.txt. "$1" is the first argument of the shell script, the file name pg42671.txt; "$2" is the second argument, the end line number 20; and "$3" is the third argument, the number of lines 5. The first line, #!/bin/sh (the shebang), tells the operating system which interpreter to use when the script is executed directly as ./middle.sh, so it runs under sh regardless of the shell it is launched from. When the script is run as bash middle.sh, the shebang is ignored because bash is named explicitly.
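To check this reading of the pipeline, the sketch below wraps the same head | tail pipeline in a shell function (middle is a hypothetical name) and compares it with sed's line-range printing. The input is the numbers 1 to 100, so each line's content is its own line number.

```bash
#!/usr/bin/env bash
# Sketch: head -n END | tail -n NUM selects lines END-NUM+1 through END.
set -eu

middle() {  # same pipeline as middle.sh: middle FILE END_LINE NUM_LINES
  head -n "$2" "$1" | tail -n "$3"
}

numbers=$(mktemp)
seq 1 100 > "$numbers"

middle "$numbers" 20 5          # prints 16 through 20, one per line
sed -n '16,20p' "$numbers"      # the same lines via sed's line range
```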
Q5. More fun with Linux
Try following commands in Bash and interpret the results: cal, cal 2024, cal 9 1752 (anything unusual?), date, hostname, arch, uname -a, uptime, who am i, who, w, id, last | head, echo {con,pre}{sent,fer}{s,ed}, time sleep 5, history | tail.
cal
    January 2024
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
Answer: The cal command displays a calendar for the current month.
cal 2024
                              2024
      January               February               March
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6               1  2  3                  1  2
 7  8  9 10 11 12 13   4  5  6  7  8  9 10   3  4  5  6  7  8  9
14 15 16 17 18 19 20  11 12 13 14 15 16 17  10 11 12 13 14 15 16
21 22 23 24 25 26 27  18 19 20 21 22 23 24  17 18 19 20 21 22 23
28 29 30 31           25 26 27 28 29        24 25 26 27 28 29 30
                                            31
       April                  May                   June
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6            1  2  3  4                     1
 7  8  9 10 11 12 13   5  6  7  8  9 10 11   2  3  4  5  6  7  8
14 15 16 17 18 19 20  12 13 14 15 16 17 18   9 10 11 12 13 14 15
21 22 23 24 25 26 27  19 20 21 22 23 24 25  16 17 18 19 20 21 22
28 29 30              26 27 28 29 30 31     23 24 25 26 27 28 29
                                            30
        July                 August              September
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6               1  2  3   1  2  3  4  5  6  7
 7  8  9 10 11 12 13   4  5  6  7  8  9 10   8  9 10 11 12 13 14
14 15 16 17 18 19 20  11 12 13 14 15 16 17  15 16 17 18 19 20 21
21 22 23 24 25 26 27  18 19 20 21 22 23 24  22 23 24 25 26 27 28
28 29 30 31           25 26 27 28 29 30 31  29 30
      October               November              December
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
       1  2  3  4  5                  1  2   1  2  3  4  5  6  7
 6  7  8  9 10 11 12   3  4  5  6  7  8  9   8  9 10 11 12 13 14
13 14 15 16 17 18 19  10 11 12 13 14 15 16  15 16 17 18 19 20 21
20 21 22 23 24 25 26  17 18 19 20 21 22 23  22 23 24 25 26 27 28
27 28 29 30 31        24 25 26 27 28 29 30  29 30 31
Answer: The cal 2024 command displays a calendar for the year 2024.
cal 9 1752
   September 1752
Su Mo Tu We Th Fr Sa
       1  2 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
Answer: The cal 9 1752 command displays the calendar for September 1752, which is unusual: 11 days, September 3 through 13, are missing, so Wednesday the 2nd is followed directly by Thursday the 14th. This is because the British Empire and its colonies switched from the Julian calendar to the Gregorian calendar in September 1752. By then the Julian calendar had fallen 11 days behind the Gregorian calendar, so 11 days were dropped to bring the two into line.
date
Thu Jan 25 18:09:46 PST 2024
Answer: The date command displays the current date and time.
hostname
zzhMac.local
Answer: The hostname command displays the name of the current host system (here, zzhMac.local). With sufficient privileges, it can also be used to set the system’s host name.
arch
i386
Answer: The arch command displays the machine’s architecture type. On this Intel Mac it prints i386, even though uname -a reports an x86_64 kernel.
uname -a
Darwin zzhMac.local 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct 9 21:27:27 PDT 2023; root:xnu-10002.41.9~6/RELEASE_X86_64 x86_64
Answer: The uname -a command displays the kernel name (Darwin), host name, kernel release and version, and machine hardware name (x86_64) of the system. The -a option means print all available information.
Answer: The uptime command displays the current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
who am i
zihengzhang Jan 25 18:09
Answer: The who am i command displays the login information of the current user.
who
zihengzhang console Jan 8 22:43
Answer: The who command lists the users currently logged in to the system, with their terminal and login time.
w
18:09 up 16 days, 19:27, 1 user, load averages: 3.02 2.20 1.95
USER TTY FROM LOGIN@ IDLE WHAT
zihengzhang console - 08Jan24 16days -
Answer: The w command displays the login information of all users and what they are doing.
Answer: The id command displays the current user’s user and group IDs. “uid” is the user ID, a number that uniquely identifies the user on the system. “gid” is the ID of the user’s primary group. “groups” lists all groups the user belongs to, including supplementary groups.
last | head
zihengzhang ttys001 Wed Jan 24 16:29 - 16:29 (00:00)
zihengzhang ttys001 Wed Jan 24 16:23 - 16:23 (00:00)
zihengzhang ttys001 Wed Jan 24 16:14 - 16:14 (00:00)
zihengzhang ttys001 Wed Jan 24 16:02 - 16:02 (00:00)
zihengzhang ttys001 Wed Jan 24 16:01 - 16:01 (00:00)
zihengzhang ttys001 Wed Jan 24 16:00 - 16:00 (00:00)
zihengzhang ttys000 Mon Jan 22 10:42 - 10:42 (00:00)
zihengzhang ttys001 Sat Jan 20 20:48 - 20:48 (00:00)
zihengzhang ttys000 Thu Jan 18 18:40 - 18:40 (00:00)
zihengzhang ttys000 Thu Jan 18 18:24 - 18:24 (00:00)
Answer: The last | head command displays the 10 most recent login sessions: last lists past logins with the most recent first, and head keeps the first 10 lines.
Answer: The echo {con,pre}{sent,fer}{s,ed} command uses brace expansion to print every combination formed by taking one item from each pair of braces, in order: con or pre, then sent or fer, then s or ed, giving 2 × 2 × 2 = 8 words.
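Here is a minimal sketch of the expansion order: the leftmost braces vary slowest, exactly as in nested loops. Brace expansion is a Bash feature (a plain POSIX sh such as dash does not perform it) and is purely textual, which is why non-words such as "confered" appear.

```bash
#!/usr/bin/env bash
# Sketch: brace expansion as a Cartesian product, leftmost brace slowest.
words=$(echo {con,pre}{sent,fer}{s,ed})
echo "$words"
# consents consented confers confered presents presented prefers prefered

# The same ordering with explicit nested loops.
for a in con pre; do
  for b in sent fer; do
    for c in s ed; do
      printf '%s ' "$a$b$c"
    done
  done
done
echo
```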
time sleep 5
real 0m5.007s
user 0m0.001s
sys 0m0.002s
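Answer: The time sleep 5 command runs sleep 5, which pauses for 5 seconds, and reports how long it took. real is the elapsed wall-clock time (about 5 seconds), while user and sys are the CPU time spent in user and kernel mode, which are near zero because sleep just waits. The sleep command is commonly used to delay the next command in a script for a fixed amount of time.
history | tail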
Answer: The history | tail command displays the last 10 commands that were run in the bash shell.
Open the project by clicking rep-res-3rd-edition.Rproj and compile the book by clicking Build Book in the Build panel of RStudio. (Hint: I was able to build git_book and epub_book but not pdf_book.)
The point of this exercise is (1) to get the book for free and (2) to see an example of how a complicated project such as a book can be organized in a reproducible way.
For grading purpose, include a screenshot of Section 4.1.5 of the book here.